Table of Contents

  1. Introduction
  2. Data preprocessing
    1. Importing the data
    2. Creating data columns
    3. Census Data
      1. New York City Population by Borough, 1950 - 2040
  3. Exploratory Data Analysis
    1. Info and Summary
    2. Pair Plot
    3. Correlation Matrix
    4. Racial and ethnic diversity of NYC boroughs
      1. Absolute population values
      2. Relative population values
    5. Population of NYC over time
  4. Complaints per Officer and per Precinct
    1. Absolute number and relative proportion of complaints
    2. Complaints broken down by race
      1. Race of complainants
      2. Race of officers
  5. Complaints over the years
    1. Absolute number and relative proportion of complaints
    2. Absolute and relative number of Complaints per borough
      1. Absolute number of complaints
      2. Relative number of complaints
  6. Officers with most complaints
    1. First Officer
    2. Second Officer
    3. Officers with top 5% of complaints
  7. Conclusion

Civilian Complaints Against NYC Police Officers from 1985-2020

DATS 6103 Individual Project 3

Izzy Illari (GWID: G38518463)

Introduction

This data has been taken from the following Kaggle dataset, which states:

After New York state repealed the statute that kept police disciplinary records secret, known as 50-a, ProPublica filed a records request with New York City’s Civilian Complaint Review Board, which investigates complaints by the public about NYPD officers. The board provided us with records about closed cases for every police officer still on the force as of late June 2020 who had at least one substantiated allegation against them. The records span decades, from September 1985 to January 2020.

Each record lists:

  1. name,
  2. shield number,
  3. rank of officer as of today and at the time of the incident,
  4. precinct of officer as of today and at the time of the incident,
  5. age, race, and gender of the complainant,
  6. age, race, and gender of the officer,
  7. category describing the alleged misconduct,
  8. whether the CCRB concluded the officers’ conduct violated NYPD rules

Currently in the USA we are experiencing another surge of BLM protests that echoes the movements that started back during the Obama administration in 2013. The BLM movement speaks out against the police brutality and systemic racism that has caused the deaths of Black Americans such as Trayvon Martin, Sandra Bland, Tamir Rice, Eric Garner, George Floyd, Breonna Taylor, and many more. The BLM movement has brought the public's attention to the thousands of violent incidents that happen to Black Americans that are not seen and not heard. For incidents that are recorded, we can turn to police reports.

Although these police reports are insufficient for analyzing the sheer breadth of police encounters that Black Americans must endure on a daily basis, they do give us a place to start: do the numbers support what the BLM movement states, and are Black Americans forced to endure more incidents with the police than any other racial or ethnic group?

For this project I focus on the most populous city in the USA: NYC. NYC is incredibly racially diverse, and each of the five boroughs has a different racial and ethnic makeup. The aim of this project is to analyze the available data on incidents with the NYPD and see whether any patterns arise that might support the conclusion that Black Americans are disproportionately targeted by the police.

Data preprocessing

Importing the data

For the CCRB Data Layout Table we will need to import the individual sheets from the Excel file.

The CCRB Data Layout Table is needed to explain some of the abbreviations and categories in allegations_202007271729.csv, which is why we have imported it.

Creating data columns

Now we're going to create a few new columns: the officer's full name, the full date the complaint was received, and the full date the complaint was closed.
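A minimal sketch of how these columns could be built with pandas. The column names `first_name`, `last_name`, `month_received`, and `month_closed` are assumptions about the CSV layout (only `year_received` and `year_closed` are named in this notebook), the rows are toy data, and the day of the month is pinned to 1 on the assumption that only month and year are recorded.

```python
import pandas as pd

# Toy rows standing in for the allegations CSV; column names are assumed.
df_allegations = pd.DataFrame({
    "first_name": ["Jane", "John"],
    "last_name": ["Doe", "Smith"],
    "month_received": [9, 1],
    "year_received": [1985, 2020],
    "month_closed": [3, 6],
    "year_closed": [1986, 2020],
})

# Full name as a single string column.
df_allegations["full_name"] = (
    df_allegations["first_name"] + " " + df_allegations["last_name"]
)


def month_year_to_date(frame, month_col, year_col):
    """Build a datetime column from month/year columns, with the day set to 1."""
    parts = pd.DataFrame({
        "year": frame[year_col],
        "month": frame[month_col],
        "day": 1,
    })
    return pd.to_datetime(parts)


df_allegations["date_received"] = month_year_to_date(
    df_allegations, "month_received", "year_received"
)
df_allegations["date_closed"] = month_year_to_date(
    df_allegations, "month_closed", "year_closed"
)
```

Keeping the dates as real datetimes (rather than strings) makes later grouping by year or computing time-to-close straightforward.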

There is an odd case where a few of the officers have their precinct labelled as 1000 or 0. There are only 77 police precincts in NYC. The highest number is the 123rd Precinct on Staten Island and the lowest number is the 1st Precinct in Manhattan. I'm going to remove the rows where precinct == 1000 or precinct == 0 because I am not sure which precinct these values are supposed to be referring to (they could be internal numbers used to reference a specific entity within the NYPD structure).
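The filter described here can be sketched as follows. The DataFrame name `df_allegations` and the toy precinct values are illustrative; only the `precinct` column and the two placeholder values come from the text.

```python
import pandas as pd

# Toy data: two valid extremes, one mid-range precinct, two placeholder values.
df_allegations = pd.DataFrame({"precinct": [0, 1, 75, 123, 1000]})

# Drop rows whose precinct is one of the placeholder values.
invalid_precincts = [0, 1000]
df_allegations = df_allegations[
    ~df_allegations["precinct"].isin(invalid_precincts)
]

lowest = df_allegations["precinct"].min()
highest = df_allegations["precinct"].max()
print(lowest, highest)
```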

Now this makes sense! We see that by removing 1000 and 0 we are left with the 123rd Precinct and the 1st Precinct, which is what we expected in the first place.

I'm going to make a new column for borough, which will represent the borough of the precinct in which the complaint was filed.
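One way to derive the borough is a lookup over precinct-number ranges. The ranges below are an assumption based on the usual NYPD numbering (Manhattan 1-34, Bronx 40-52, Brooklyn 60-94, Queens 100-115, Staten Island 120-123) and should be checked against the official precinct list; the helper `precinct_to_borough` is hypothetical.

```python
import pandas as pd

# Assumed inclusive precinct-number ranges per borough.
BOROUGH_RANGES = {
    "Manhattan": (1, 34),
    "Bronx": (40, 52),
    "Brooklyn": (60, 94),
    "Queens": (100, 115),
    "Staten Island": (120, 123),
}


def precinct_to_borough(precinct):
    """Return the borough whose range contains the precinct, else None."""
    for borough, (low, high) in BOROUGH_RANGES.items():
        if low <= precinct <= high:
            return borough
    return None


df_allegations = pd.DataFrame({"precinct": [1, 44, 75, 110, 123]})
df_allegations["borough"] = df_allegations["precinct"].map(precinct_to_borough)
```

Returning `None` for unmatched numbers makes any precinct outside the assumed ranges easy to spot with `df_allegations["borough"].isna()`.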

Okay, it looks like that worked! We now have the borough associated with the precincts.

Census Data

Now let's import the County Population by Characteristics: 2010-2019 from the US Census. This data contains the Annual County Resident Population Estimates by Age, Sex, Race, and Hispanic Origin from April 1, 2010 to July 1, 2019 (cc-est2019-alldata-36.csv). This csv file is going to need serious reformatting to get the information that I want, but it will be interesting to see the demographics of the boroughs and what conclusions we can draw when compared to the NYPD officer complaints.

According to the key for the YEAR variable, a year of 12 means a 7/1/2019 population estimate. We will only look at the 2019 estimates. We will also only be looking at the 5 boroughs. However, this data is sorted by county name, meaning we will need to look at Bronx County (Bronx), Kings County (Brooklyn), Queens County (Queens), New York County (Manhattan), and Richmond County (Staten Island). An Age Group of 0 is the population across all ages, which will also be the only age group that we will keep. There are a few columns that I will be combining as well, but first let's reduce the rows before tackling the columns.
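The row reduction described above might look like the sketch below. `YEAR` is named in the text; the column names `AGEGRP`, `CTYNAME`, and `TOT_POP` are assumptions about the census file, and the population numbers are made-up placeholders.

```python
import pandas as pd

# Toy stand-in for the census file; values are placeholders, not real counts.
census = pd.DataFrame({
    "CTYNAME": ["Bronx County", "Bronx County", "Albany County", "Kings County"],
    "YEAR": [12, 11, 12, 12],
    "AGEGRP": [0, 0, 0, 5],
    "TOT_POP": [100, 99, 50, 20],
})

# County names as they appear in the census data, mapped to borough names.
county_to_borough = {
    "Bronx County": "Bronx",
    "Kings County": "Brooklyn",
    "Queens County": "Queens",
    "New York County": "Manhattan",
    "Richmond County": "Staten Island",
}

# Keep the 7/1/2019 estimate (YEAR == 12), all ages (AGEGRP == 0), boroughs only.
mask = (
    (census["YEAR"] == 12)
    & (census["AGEGRP"] == 0)
    & census["CTYNAME"].isin(list(county_to_borough))
)
boroughs = census[mask].copy()
boroughs["borough"] = boroughs["CTYNAME"].map(county_to_borough)
```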

Okay! We've reduced the rows from 14136 to just 5: the 5 boroughs for population estimates in 2019 for all age groups. Now I need to combine certain columns. For example, they have WAC_MALE/WAC_FEMALE (White alone or in combination male/female population), and WA_MALE/WA_FEMALE (White alone male/female population). I only want the alone or in combination columns for every race. I will also be creating a column that is the total alone or in combination population (male plus female) for each race. We also have columns for Not-Hispanic and Hispanic. This could be useful information to see if, perhaps, individuals that are Hispanic are targeted more, but the original complaints data includes race but not whether a person is Hispanic. I will exclude these columns.

It looks like every alone or combined population column has AC_ in the title, which is what I'll use to filter the columns. The not Hispanic columns start with NH and the Hispanic columns start with H. I'll also exclude those.

We need to also keep a few of the other columns in our data, such as the county name.
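A hedged sketch of the column selection and the male/female totals. The column names follow the census naming pattern quoted above (`WAC_MALE`, `WA_MALE`); `BAC_*`, `NHWAC_*`, `HWAC_*`, `CTYNAME`, and the `_TOTAL` suffix are illustrative assumptions, and the numbers are placeholders.

```python
import pandas as pd

boroughs = pd.DataFrame({
    "CTYNAME": ["Bronx County"],
    "WA_MALE": [10], "WA_FEMALE": [11],      # alone: dropped
    "WAC_MALE": [12], "WAC_FEMALE": [13],    # alone or in combination: kept
    "BAC_MALE": [20], "BAC_FEMALE": [21],    # alone or in combination: kept
    "NHWAC_MALE": [5], "NHWAC_FEMALE": [6],  # Not-Hispanic: dropped
    "HWAC_MALE": [7], "HWAC_FEMALE": [7],    # Hispanic: dropped
})

# Keep "alone or in combination" columns, skipping the NH*/H* variants.
ac_cols = [
    col for col in boroughs.columns
    if "AC_" in col and not col.startswith(("NH", "H"))
]

# Carry the county name along with the selected race columns.
selected = boroughs[["CTYNAME"] + ac_cols].copy()

# One total column per race prefix: male + female.
for prefix in sorted({col.split("_")[0] for col in ac_cols}):
    selected[prefix + "_TOTAL"] = (
        selected[prefix + "_MALE"] + selected[prefix + "_FEMALE"]
    )
```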

Great! That worked. We now have only the columns and rows we need from the census data.

New York City Population by Borough, 1950 - 2040

The Department of City Planning (DCP) in the city of New York has provided the unadjusted decennial census data from 1950-2000 and projected figures from 2010-2040: summary table of New York City population numbers and percentage share by Borough, including school-age (5 to 17), 65 and Over, and total population. This data can be found on the city of New York website. The data was last updated on February 7, 2020, and does not include the racial breakdown of the population but will be useful for when we check how the number of complaints changes over the years in comparison to the population changes.

This data is terribly formatted, so we're going to need to adjust it a lot.

I want to create a "normalized" population value, where I will take the 1980 values and essentially set those to "zero", or my base line. Then I will take the population as a change in population from 1980, rather than an absolute value.
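A small sketch of the baseline normalization, under the assumption that the population table has one row per year and one column per borough. The numbers are made up. Dividing by the 1980 row gives a ratio to 1980; subtracting 1 from that ratio gives a fractional change that is zero at the baseline.

```python
import pandas as pd

# Toy population table (placeholder values), indexed by year.
population = pd.DataFrame(
    {"Bronx": [100, 110, 140], "Queens": [200, 210, 230]},
    index=pd.Index([1980, 2000, 2020], name="year"),
)

# The 1980 row is the baseline; division aligns it with the borough columns.
baseline = population.loc[1980]
ratio_to_1980 = population / baseline

# Fractional change relative to 1980, so the baseline year sits at zero.
fractional_change = ratio_to_1980 - 1
```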

Great, we've got our absolute population for NYC and her boroughs from 1980-2020 as well as the change in population from 1980 levels. I'm also going to want the percentage shares to combine with this.

Great, now we have all the population data we'll need (for now). Any additional changes (such as taking the log of values) will be performed later.

Exploratory Data Analysis

Info and Summary

Let's take a look at the data! We'll be looking at a summary of the data as well as some pairplots and correlation matrices, just to get a feel for what we're working with.

Pair Plot

Let's see if anything interesting can be observed in a pairplot of the allegation data.

This pair plot is not great to look at. We see that most of these complaints are made about Police Officers (as opposed to Sergeants or Lieutenants).

Correlation Matrix

Let's see if anything interesting can be observed in a correlation matrix of the allegation data.

It doesn't look like there are any surprising correlations in this data. We have that the year_received and year_closed are highly correlated, and that the complaint_id is highly correlated with the year_closed, leading me to believe that the complaint_id must be a number that involves the year the complaint is closed. However, for the most part, the correlations amongst the features are fairly weak.

Racial and ethnic diversity of NYC boroughs

Absolute population values

Since we have the data, let's check the racial and ethnic diversity of the five NYC boroughs. First we will look at the raw population values.

We can see that Brooklyn (~2.5 million) and Queens (~2.3 million) have the highest populations and that Staten Island (~0.5 million) has the lowest. Brooklyn has the most white and black people of all the boroughs, whereas Queens has the most Asian and Native Hawaiian individuals. Brooklyn and Queens appear to be pretty much tied for the number of people that are two or more races, whereas the Bronx has the most Native American individuals.

Relative population values

Let's look at the relative population of the boroughs for racial and ethnic diversity. This will require me to make a few new columns in the data.
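The new columns could be computed roughly as below. The names `TOT_POP`, `*_TOTAL`, and `*_PCT` are assumptions for illustration and the values are placeholders. Because the "alone or in combination" counts include multiracial individuals in more than one group, the percentages need not sum to 100.

```python
import pandas as pd

# Toy borough table (placeholder values).
boroughs = pd.DataFrame(
    {
        "TOT_POP": [1000, 500],
        "WAC_TOTAL": [400, 350],
        "BAC_TOTAL": [450, 100],
    },
    index=pd.Index(["Borough A", "Borough B"], name="borough"),
)

# Express each race total as a percentage of the borough's total population.
race_cols = [col for col in boroughs.columns if col.endswith("_TOTAL")]
for col in race_cols:
    pct_col = col.replace("_TOTAL", "_PCT")
    boroughs[pct_col] = 100 * boroughs[col] / boroughs["TOT_POP"]
```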

With our relative populations we can plot these results.

This paints a different picture than the raw/absolute population values. We can see that even though Staten Island has the smallest absolute population, it has the highest proportion of white people, whereas Brooklyn had the highest absolute white population. Queens still has (proportionally) the most Asian individuals, but now we see that the Bronx leads in the black, Native Hawaiian, and Two or more Races categories. We see that proportionally Manhattan is ~70% white, and does not lead in any other category of race. We see that the Bronx is fairly diverse: it has the smallest proportional white population but ranks high in every other category (except for Asians). We also see that Brooklyn is fairly diverse as well, with large proportional black and mixed race groups.

It will be interesting to see which boroughs have the most complaints. We expect that the more racially diverse boroughs will have larger numbers of police complaints, whereas the whiter boroughs might have fewer complaints against the NYPD.

Population of NYC over time

Let's look at how the population in NYC has changed over time. This will be useful when looking at the change in the number of complaints.

In terms of population growth each borough increases at a slower rate than the total of NYC, which makes sense. The log of the population was included to tease out the behavior of the boroughs bunched together with populations of ~1 million. It seems that most of the boroughs have stayed at roughly the same percentage of NYC's total population: Brooklyn is consistently over 30% of the NYC total, Queens is around 27%, Staten Island stays around 5%, and so on. The biggest observable change appears when we compare the population change from 1980 to 2020. If we take the 1980 population as the "base" value and normalize the other years by this value, we can find the change from the 1980 population. We see that in 2020 the Bronx is at almost 1.4x what it was in 1980. The Bronx has had the most dramatic change in population from 1980, Queens experienced a small spike in 2000, and Manhattan has been noticeably linear in its growth.

We will see how the number of complaints from the boroughs compares to the population increases. Naturally one would imagine that the number of complaints would increase as the population increases. However, we are looking to see if any of the boroughs have a number of complaints that is wildly different from what their population growth would suggest.

Complaints per Officer and per Precinct

Absolute number and relative proportion of complaints

Let's look at some of the values like unique_mos_id and precinct and see just the counts of them first. It would be interesting to see, for example, if one particular officer or precinct has a particularly high number of filed complaints as compared to the others.

From looking at the above histograms I can see that the officers with the most complaints have an ID of roughly 20000-25000. It appears that these officers get a lot of complaints in the Bronx and Brooklyn. For the precincts we see that 40-50 (which are in the Bronx) and ~65-80 (which are in Brooklyn) have the most complaints. This matches where the officers with the most complaints are located. It also appears that several officers receive many more complaints than others.

Interestingly enough, Queens has relatively few complaints despite its large Asian population, whereas Staten Island has more complaints despite its high proportion of whites. Manhattan has the fewest complaints, and is also majority white and incredibly wealthy.

Complaints broken down by race

In the above we looked at how the complaints broke down by borough, but the data does include the various features of both the officer and the complainant (age, gender, race).

Race of complainants

Let's look to see who is filing the most complaints against the NYPD, and thus the individuals with the most exposure to and incidents with the NYPD.
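A borough-by-race breakdown like this can be tabulated with a cross-tabulation. The column name `complainant_ethnicity` is an assumption about the CSV, `borough` is the derived column described earlier, and the rows are toy data.

```python
import pandas as pd

# Toy complaints (placeholder rows); each row is one complaint.
df_allegations = pd.DataFrame({
    "borough": ["Bronx", "Bronx", "Bronx", "Queens", "Queens"],
    "complainant_ethnicity": ["Black", "Black", "Hispanic", "Asian", "Black"],
})

# Rows: borough, columns: complainant race, values: number of complaints.
by_race = pd.crosstab(
    df_allegations["borough"], df_allegations["complainant_ethnicity"]
)

# Same table as a share of each borough's complaints (rows sum to 1).
by_race_share = pd.crosstab(
    df_allegations["borough"],
    df_allegations["complainant_ethnicity"],
    normalize="index",
)
```

The normalized version is the one to compare against each borough's racial composition, since it removes the effect of borough size.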

Indeed, the people filing the most complaints against the NYPD are black individuals, followed by Hispanic individuals. Even in boroughs that are majority white by large margins (Manhattan and Staten Island) we see that black individuals file the most complaints. Asian individuals file some of the smallest numbers of complaints, which might explain why Queens, with such a large population and such a large Asian community, files far fewer complaints than you might expect for its population (see section below for details).

Race of officers

Let's look to see who the officers that receive the complaints are.

By and large the police officers who receive complaints are white, with the next largest group being Hispanic. Black individuals account for very few of these officers.

Complaints over the years

Absolute number and relative proportion of complaints

Let's look at how the number of complaints each year has changed. But first, in order to do so, I'll need to create new data from which to plot.

We see that as we move away from 1985 the number of complaints against the NYPD increases. The most complaints were received in 2016, and the most complaints were closed in 2015. The BLM protests started back in 2013, which you might have expected to lead to an increase in complaints against the NYPD, and that does indeed appear to be the case. It seems the complaints escalated to a peak right before and during the election year. There is another peak in complaints in 2019, the start of a new wave of ongoing BLM protests, and the low 2020 numbers of complaints received and closed might be due to the COVID-19 pandemic and quarantine restrictions.

What I do find interesting is that for the most part the number of closed complaints followed the trend of received complaints up until ~2009, and then the number of closed complaints takes a deep dip in ~2011 only to shoot back up in ~2015. From 1985 until the mid 2000s there was steady growth in complaints, which might simply be due to increases in the population of NYC. Luckily, we have the NYC census data starting

Absolute and relative number of Complaints per borough

Absolute number of complaints

Now let's see what the complaint numbers per borough and the complaints per borough per person look like. I expect the Bronx and Brooklyn to have high numbers of complaints as we saw above, but maybe their large populations will mean each borough has roughly the same number of complaints per person.

Now I need the sums of the number of complaints each year, because I need to divide the number of complaints each borough received in a year by the total for that year.
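The per-year totals and the resulting shares can be sketched as follows, assuming one row per complaint with a `year_received` column and the derived `borough` column. The rows are toy data.

```python
import pandas as pd

# Toy complaints: one row per complaint.
df_allegations = pd.DataFrame({
    "year_received": [2015, 2015, 2015, 2016, 2016],
    "borough": ["Brooklyn", "Brooklyn", "Bronx", "Queens", "Brooklyn"],
})

# Complaints per year (rows) and borough (columns); missing pairs become 0.
counts = (
    df_allegations.groupby(["year_received", "borough"])
    .size()
    .unstack(fill_value=0)
)

# Total complaints received in each year across all boroughs.
yearly_total = counts.sum(axis=1)

# Each borough's share of that year's complaints (every row sums to 1).
share = counts.div(yearly_total, axis=0)
```

These shares are the numbers to compare against each borough's share of the NYC population.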

The number of complaints received does indeed reflect the populations of the five boroughs. Brooklyn receives the most complaints and is the most populous borough, whereas Staten Island receives the fewest complaints and is the least populous borough. Usually log plots help elucidate the difference between data by showing orders of magnitude, but there is a lot of overlap in the data.

The number of complaints closed does not seem to follow an appreciable pattern. We might have expected a similar curve to the complaints received, but it appears that complaints received in the boroughs almost get completed at random regardless of which borough you come from.

Relative number of complaints

Let's look at how the boroughs compare with percentages of complaints.

As we saw before, the borough you come from has no effect on when the complaint is closed; it's basically a toss-up. As for the complaints received, we see something quite interesting. From earlier we know that Brooklyn has ~30% of the NYC population, and yet for almost every year in the data it accounts for over 40% of the complaints received. We also know that proportionally Brooklyn has one of the largest black communities, right after the Bronx. We would expect Queens to have ~27% of the complaints received, but instead it falls below ~20% and sometimes even as low as ~10%. The Bronx has ~15% of the population and yet ~30% of the complaints, and is the borough with the largest proportional black population. We see that Staten Island does have both the smallest population and the smallest number of complaints, and Manhattan has ~20% of the population and ~20% of the complaints. It seems, however, that the more diverse the borough, the more complaints are received against the police, and in numbers that cannot be explained by just the total population of said borough.

Officers with most complaints

Let's look to see which officers have the most complaints.

First Officer

All of the complaints against Sbarra have been from either black or Hispanic individuals. It should be noted that he works in Brooklyn. There are several board dispositions associated with the complaints. The dispositions are defined as follows:

  1. Substantiated:
    1. The alleged conduct occurred and it violated the rules. The CCRB can recommend discipline, but the NYPD can choose to ignore those recommendations. It has discretion over what, if any, discipline is imposed.
  2. Exonerated:
    1. The alleged conduct occurred but did not violate the NYPD’s rules, which often give officers significant discretion over use of force.
  3. Unsubstantiated:
    1. The CCRB has fully investigated but could not affirmatively conclude both that the conduct occurred and that it broke the rules.

It seems that the majority of the complaints against Sbarra were either found to be Unsubstantiated or Exonerated. There were several that were substantiated and resulted in command discipline, and a good majority of Sbarra's complaints were for abuse of authority.

Second Officer

Reich has a much different history than Sbarra. Reich works in the Narcotics Borough of Staten Island and the majority of individuals filing complaints against Reich are white. The majority of the complaints against Reich were found to be unsubstantiated. Reich could potentially be dealing with a lot of individuals involved with narcotics in Staten Island, which might explain why mostly white people are filing complaints against him.

Officers with top 5% of complaints

It is possible that there is a trend in who files complaints against the officers that have the most complaints. We will sort the officers by how many complaints each has, and take the top 5% of those.

Okay, we've got each officer's ID and the frequency of complaints against them. Let's look at the "top" officers, the ones with the most complaints. Because we're looking at the "top 5%", this will be the 200 officers with the most complaints.
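A rough sketch of the top-5% selection, using the `unique_mos_id` column mentioned earlier. The toy data has 40 officers, so the top 5% is 2 officers; with roughly 4,000 officers the same calculation would give the 200 quoted above. Rounding up with `math.ceil` is a choice made for this sketch.

```python
import math

import pandas as pd

# Toy data: officer i appears in i complaint rows, for i = 1..40.
officer_ids = [i for i in range(1, 41) for _ in range(i)]
df_allegations = pd.DataFrame({"unique_mos_id": officer_ids})

# Complaints per officer, sorted from most to fewest.
complaints_per_officer = df_allegations["unique_mos_id"].value_counts()

# Number of officers in the top 5%, rounded up.
n_top = math.ceil(len(complaints_per_officer) * 5 / 100)
top_officers = complaints_per_officer.head(n_top)

# Keep only the complaints filed against those officers.
top_complaints = df_allegations[
    df_allegations["unique_mos_id"].isin(top_officers.index)
]
```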

Okay, let's give these results a look.

As we can see, most of the complaints come from black individuals, followed by Hispanic individuals. This is true across all the boroughs when we consider the top 200 officers with the most complaints, even in boroughs that have proportionally higher white populations, such as Manhattan or Staten Island. On the other hand, the majority of these officers are white.

Conclusion

If there were no bias in targeting individuals of color we would expect the proportion of complaints to reflect the racial composition of each borough. We would expect more black individuals to file complaints in the Bronx and Brooklyn simply because their populations are larger there, and more white individuals to file in whiter boroughs such as Manhattan and Staten Island. We would also expect more Asian individuals to file complaints in Queens, which has such a large Asian community when compared proportionally to the other boroughs. However, this does not appear to be the case. Across the board we have seen that boroughs with more black individuals file more complaints against the NYPD than other boroughs, in percentages far greater than the borough's share of NYC's total population. Filing a complaint means that you have had an encounter with the NYPD that was serious enough to make you file, meaning that in all boroughs, from what has been reported, black individuals have had more encounters with the NYPD than can be explained simply by population.